online content change
Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling
From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages). We propose a novel optimization objective for this setting that has several practically desirable properties, and efficient algorithms for it with optimality guarantees even in the face of mixed content change observability and initially unknown change model parameters. Experiments on 18.5M URLs crawled daily for 14 weeks show significant advantages of this approach over prior art.
Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling
From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages). We propose a novel optimization objective for this setting that has several practically desirable properties, and efficient algorithms for it with optimality guarantees even in the face of mixed content change observability and initially unknown change model parameters. Experiments on 18.5M URLs crawled daily for 14 weeks show significant advantages of this approach over prior art.
Reviews: Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling
The paper formulates the problem of optimizing a strategy for crawling remote contents to track their changes as an optimization problem called the freshness crawl scheduling problem. This problem is an obviously important problem in applications like Internet search engine, and the presented formulation seems to give a practical solution to those applications. The paper presents an algorithm for solving the freshness crawl scheduling problem to optimality, assuming that the contents change rates are known. The idea behind the algorithm is based on the deep understanding of statistics and continuous optimization, and it seems to me that the contribution is solid (although I could not very all the technical details). For the case where the contents change rates are not known, a reinforcement learning algorithm is presented.
Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling
From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages). We propose a novel optimization objective for this setting that has several practically desirable properties, and efficient algorithms for it with optimality guarantees even in the face of mixed content change observability and initially unknown change model parameters. Experiments on 18.5M URLs crawled daily for 14 weeks show significant advantages of this approach over prior art.
Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling
Kolobov, Andrey, Peres, Yuval, Lu, Cheng, Horvitz, Eric J.
From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages). We propose a novel optimization objective for this setting that has several practically desirable properties, and efficient algorithms for it with optimality guarantees even in the face of mixed content change observability and initially unknown change model parameters. Experiments on 18.5M URLs crawled daily for 14 weeks show significant advantages of this approach over prior art. Papers published at the Neural Information Processing Systems Conference.